In-class_Ex5 Modeling the Spatial Variation and the Explanatory factors of Water Point Status using Geographic Weighted Logistic Regression

Author

HanShumin

Published

December 17, 2022

Setting the scene

  • To build an explanatory model to discover factor affecting water point status in Osun state, Nigeria

  • Study area: Osun State, Nigeria

  • Data sets:

  • Osun.rds, contains LGAs boundaries of Osun State. It is in sf polygon dataframe, and

  • Osun_wp_sf.rds, contained water points within Osun State. It is in sf point dataframe.

Model Variables

  • Dependent variable: water point status (ie. functional/non-functional)

  • Independent variables:

  • distance_primary_road

  • distance_to_secondary_road

  • distance_to_tertiary_road

  • distance_to_city

  • distance_to_town

  • water_point_population

  • local_population_1km

  • usage_capacity,

  • is_urban,

  • water_source_clean

Getting start

create in-class Exercise 5 folder

Write a code chunk to load the following R package: sf, tidyverse, funModeling, blorr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, skimr, caret

  • sf for importing and processing geospatial data,

  • tidyverse for importing and processing non-spatial data. In this exercise, readr package will be used for importing wkt data and dplyr package will be used to wrangling the data.

  • Spatial data handling

    • sf and spdep
  • Attribute data handling

    • tidyverse
  • Choropleth mapping

    • tmap
  • Multivariate data visualisation and analysis

    • coorplot, ggpubr,
  • Classification And REgression Training

    • caret
  • Automated Reporting

    • report
  • Binary Logistic Regression Models

    • blorr
  • Geographically-Weighted Models

    • GWmodel
  • summary statistics about variables in data frames

    • skimr
pacman::p_load(rgdal, spdep, tmap, sf, 
               ggpubr, corrplot, tidyverse, funModeling, blorr, GWmodel, skimr, caret, report)

Data Import

The code chunk below load the datasets into R.

Osun <- read_rds("rds/Osun.rds")
Osun_wp_sf <- read_rds("rds/Osun_wp_sf.rds")

Thanks to Prof, the Osun and Osun_wp_sf datasets have been pre-cleaned and wrangled. In the code chunk below, freq() of funModeling package is used to display the distribution of status field in Osun_wp_sf.

Osun_wp_sf %>% 
  freq(input = "status")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
  Please report the issue at <https://github.com/pablo14/funModeling/issues>.

  status frequency percentage cumulative_perc
1   TRUE      2642       55.5            55.5
2  FALSE      2118       44.5           100.0

From the plot we can tell Osun consists of about 55.5% of functional water points, 44.5% of non-functional water points. Unknown water points has been excluded in the dataset.

we plot the distribution of the water points by status using tmap, as shown in the code chunk below.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(Osun)+
  tm_polygons(alpha = 0.4) +
tm_shape(Osun_wp_sf) +
  tm_dots(col = "status",
          alpha = 0.6) +
  tm_view(set.zoom.limits = c(9,12))

Exploratory Data Analysis

Summary Statistics with skimr

Osun_wp_sf %>%
  skim()
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Data summary
Name Piped data
Number of rows 4760
Number of columns 75
_______________________
Column type frequency:
character 47
logical 5
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
source 0 1.00 5 44 0 2 0
report_date 0 1.00 22 22 0 42 0
status_id 0 1.00 2 7 0 3 0
water_source_clean 0 1.00 8 22 0 3 0
water_source_category 0 1.00 4 6 0 2 0
water_tech_clean 24 0.99 9 23 0 3 0
water_tech_category 24 0.99 9 15 0 2 0
facility_type 0 1.00 8 8 0 1 0
clean_country_name 0 1.00 7 7 0 1 0
clean_adm1 0 1.00 3 5 0 5 0
clean_adm2 0 1.00 3 14 0 35 0
clean_adm3 4760 0.00 NA NA 0 0 0
clean_adm4 4760 0.00 NA NA 0 0 0
installer 4760 0.00 NA NA 0 0 0
management_clean 1573 0.67 5 37 0 7 0
status_clean 0 1.00 9 32 0 7 0
pay 0 1.00 2 39 0 7 0
fecal_coliform_presence 4760 0.00 NA NA 0 0 0
subjective_quality 0 1.00 18 20 0 4 0
activity_id 4757 0.00 36 36 0 3 0
scheme_id 4760 0.00 NA NA 0 0 0
wpdx_id 0 1.00 12 12 0 4760 0
notes 0 1.00 2 96 0 3502 0
orig_lnk 4757 0.00 84 84 0 1 0
photo_lnk 41 0.99 84 84 0 4719 0
country_id 0 1.00 2 2 0 1 0
data_lnk 0 1.00 79 96 0 2 0
water_point_history 0 1.00 142 834 0 4750 0
clean_country_id 0 1.00 3 3 0 1 0
country_name 0 1.00 7 7 0 1 0
water_source 0 1.00 8 30 0 4 0
water_tech 0 1.00 5 37 0 20 0
adm2 0 1.00 3 14 0 33 0
adm3 4760 0.00 NA NA 0 0 0
management 1573 0.67 5 47 0 7 0
adm1 0 1.00 4 5 0 4 0
New Georeferenced Column 0 1.00 16 35 0 4760 0
lat_lon_deg 0 1.00 13 32 0 4760 0
public_data_source 0 1.00 84 102 0 2 0
converted 0 1.00 53 53 0 1 0
created_timestamp 0 1.00 22 22 0 2 0
updated_timestamp 0 1.00 22 22 0 2 0
Geometry 0 1.00 33 37 0 4760 0
ADM2_EN 0 1.00 3 14 0 30 0
ADM2_PCODE 0 1.00 8 8 0 30 0
ADM1_EN 0 1.00 4 4 0 1 0
ADM1_PCODE 0 1.00 5 5 0 1 0

Variable type: logical

skim_variable n_missing complete_rate mean count
rehab_year 4760 0 NaN :
rehabilitator 4760 0 NaN :
is_urban 0 1 0.39 FAL: 2884, TRU: 1876
latest_record 0 1 1.00 TRU: 4760
status 0 1 0.56 TRU: 2642, FAL: 2118

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
row_id 0 1.00 68550.48 10216.94 49601.00 66874.75 68244.50 69562.25 471319.00 ▇▁▁▁▁
lat_deg 0 1.00 7.68 0.22 7.06 7.51 7.71 7.88 8.06 ▁▂▇▇▇
lon_deg 0 1.00 4.54 0.21 4.08 4.36 4.56 4.71 5.06 ▃▆▇▇▂
install_year 1144 0.76 2008.63 6.04 1917.00 2006.00 2010.00 2013.00 2015.00 ▁▁▁▁▇
fecal_coliform_value 4760 0.00 NaN NA NA NA NA NA NA
distance_to_primary_road 0 1.00 5021.53 5648.34 0.01 719.36 2972.78 7314.73 26909.86 ▇▂▁▁▁
distance_to_secondary_road 0 1.00 3750.47 3938.63 0.15 460.90 2554.25 5791.94 19559.48 ▇▃▁▁▁
distance_to_tertiary_road 0 1.00 1259.28 1680.04 0.02 121.25 521.77 1834.42 10966.27 ▇▂▁▁▁
distance_to_city 0 1.00 16663.99 10960.82 53.05 7930.75 15030.41 24255.75 47934.34 ▇▇▆▃▁
distance_to_town 0 1.00 16726.59 12452.65 30.00 6876.92 12204.53 27739.46 44020.64 ▇▅▃▃▂
rehab_priority 2654 0.44 489.33 1658.81 0.00 7.00 91.50 376.25 29697.00 ▇▁▁▁▁
water_point_population 4 1.00 513.58 1458.92 0.00 14.00 119.00 433.25 29697.00 ▇▁▁▁▁
local_population_1km 4 1.00 2727.16 4189.46 0.00 176.00 1032.00 3717.00 36118.00 ▇▁▁▁▁
crucialness_score 798 0.83 0.26 0.28 0.00 0.07 0.15 0.35 1.00 ▇▃▁▁▁
pressure_score 798 0.83 1.46 4.16 0.00 0.12 0.41 1.24 93.69 ▇▁▁▁▁
usage_capacity 0 1.00 560.74 338.46 300.00 300.00 300.00 1000.00 1000.00 ▇▁▁▁▅
days_since_report 0 1.00 2692.69 41.92 1483.00 2688.00 2693.00 2700.00 4645.00 ▁▇▁▁▁
staleness_score 0 1.00 42.80 0.58 23.13 42.70 42.79 42.86 62.66 ▁▁▇▁▁
location_id 0 1.00 235865.49 6657.60 23741.00 230638.75 236199.50 240061.25 267454.00 ▁▁▁▁▇
cluster_size 0 1.00 1.05 0.25 1.00 1.00 1.00 1.00 4.00 ▇▁▁▁▁
lat_deg_original 4760 0.00 NaN NA NA NA NA NA NA
lon_deg_original 4760 0.00 NaN NA NA NA NA NA NA
count 0 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁

With above statistics reports, we noticed in variable install_year, there are 25% are missing values. To impute the missing value with zero may introduce bias to the dataset hence we need to drop this variable before feed into the modeling.

The following variables are selected for modelling. To ensure there is no missing values in the dataset, we use all_var(!is.na(.)) to exclude the missing values.

Factors are used to represent categorical data. Here we will recode usage_capacity as categorical variables since it has only 3 types of values.

Osun_wp_sf_clean <- Osun_wp_sf %>%
  filter_at(vars(status,
                 distance_to_primary_road,
                 distance_to_secondary_road,
                 distance_to_tertiary_road,
                 distance_to_city,
                 distance_to_town,
                 water_point_population,
                 local_population_1km,
                 usage_capacity,
                 is_urban,
                 water_source_clean),
            all_vars(!is.na(.)))%>%
  mutate(usage_capacity = as.factor(usage_capacity))

Correlation Analysis

We will extract the desired variables from Osun_wp_sf_clean dataframe and assign to Osun_wp dataframe, also exclude st_set_geometry for following correlation matrix plot setp.

Osun_wp <- Osun_wp_sf_clean %>%
  select(c(7,35:39,42:43,46:47,57)) %>%
  st_set_geometry(NULL)
cluster_vars.cor = cor(
  Osun_wp[,2:7])
corrplot.mixed(cluster_vars.cor,
               lower = "ellipse",
               upper = "number",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black")

Noticed all variables we selected are not highly correlated with each other. Therefore we do not need to remove any of them.

Building a Logistic Regression Models

We will use below code chunk to fit generalized linear model, the function is glm.

model <- glm(status ~ distance_to_primary_road +
               distance_to_secondary_road +
               distance_to_tertiary_road +
               distance_to_city +
               distance_to_town +
               is_urban +
               usage_capacity +
               water_source_clean +
               water_point_population +
               local_population_1km,
             data = Osun_wp_sf_clean,
             family = binomial(link = 'logit'))

blr_regress is Binary logistic regression.

blr_regress(model)
                             Model Overview                              
------------------------------------------------------------------------
Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
------------------------------------------------------------------------
  data       status     4756      4755           4744           TRUE     
------------------------------------------------------------------------

                    Response Summary                     
--------------------------------------------------------
Outcome        Frequency        Outcome        Frequency 
--------------------------------------------------------
   0             2114              1             2642    
--------------------------------------------------------

                                 Maximum Likelihood Estimates                                   
-----------------------------------------------------------------------------------------------
               Parameter                    DF    Estimate    Std. Error    z value     Pr(>|z|) 
-----------------------------------------------------------------------------------------------
              (Intercept)                   1      0.3887        0.1124      3.4588       5e-04 
        distance_to_primary_road            1      0.0000        0.0000     -0.7153      0.4744 
       distance_to_secondary_road           1      0.0000        0.0000     -0.5530      0.5802 
       distance_to_tertiary_road            1      1e-04         0.0000      4.6708      0.0000 
            distance_to_city                1      0.0000        0.0000     -4.7574      0.0000 
            distance_to_town                1      0.0000        0.0000     -4.9170      0.0000 
              is_urbanTRUE                  1     -0.2971        0.0819     -3.6294       3e-04 
           usage_capacity1000               1     -0.6230        0.0697     -8.9366      0.0000 
water_source_cleanProtected Shallow Well    1      0.5040        0.0857      5.8783      0.0000 
   water_source_cleanProtected Spring       1      1.2882        0.4388      2.9359      0.0033 
         water_point_population             1      -5e-04        0.0000    -11.3686      0.0000 
          local_population_1km              1      3e-04         0.0000     19.2953      0.0000 
-----------------------------------------------------------------------------------------------

 Association of Predicted Probabilities and Observed Responses  
---------------------------------------------------------------
% Concordant          0.7347          Somers' D        0.4693   
% Discordant          0.2653          Gamma            0.4693   
% Tied                0.0000          Tau-a            0.2318   
Pairs                5585188          c                0.7347   
---------------------------------------------------------------

The report above shown that P-value for variable distance_to_primary_road and distance_to_secondary_road are not statistically significant at 95% Confidence Level.

report(model)
We fitted a logistic model (estimated using ML) to predict status with
distance_to_primary_road (formula: status ~ distance_to_primary_road +
distance_to_secondary_road + distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
distance_to_primary_road = 0, is at 0.39 (95% CI [0.17, 0.61], p < .001).
Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_secondary_road
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_secondary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_tertiary_road (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_tertiary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.39 (95%
CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to local_population_1km = 0, is at
0.39 (95% CI [0.17, 0.61], p < .001). Within this model:

  - The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
  - The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
  - The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
  - The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.

Below code chunk use blr_confusion_matrix to generate the binary logistic regression confusion matrix.

blr_confusion_matrix(model, cutoff = 0.5)
Confusion Matrix and Statistics 

          Reference
Prediction FALSE TRUE
         0  1301  738
         1   813 1904

                Accuracy : 0.6739 
     No Information Rate : 0.4445 

                   Kappa : 0.3373 

McNemars's Test P-Value  : 0.0602 

             Sensitivity : 0.7207 
             Specificity : 0.6154 
          Pos Pred Value : 0.7008 
          Neg Pred Value : 0.6381 
              Prevalence : 0.5555 
          Detection Rate : 0.4003 
    Detection Prevalence : 0.5713 
       Balanced Accuracy : 0.6680 
               Precision : 0.7008 
                  Recall : 0.7207 

        'Positive' Class : 1

The contingency table has 4 data cells:
1. Actual 0 Predicted 0 – The number of cases that were both predicted and
observed as 0. The records in this cell are referred to as true negatives.
The model classification was correct for these records.
2. Actual 0 Predicted 1 – The number of cases that were predicted as 1 yet
observed as 0. The records in this cell are referred to as false positives. The
model classification was incorrect for these records.
3. Actual 1 Predicted 1 – The number of cases that were both predicted and
observed as 1. The records in this cell are referred to as true positives. The
model classification was correct for these records.
4. Actual 1 Predicted 0 – The number of cases that were predicted as 0 yet
observed as 1. The records in this cell are referred to as false negatives. The
model classification was incorrect for these records.

The overall accuracy is the sum of the number of correctly classified observations
divided by the total number of observations. This not particularly useful in statistics point of view.
Sensitivity is also known as true positive rate or recall. It answers the question, “If
the model predicts a positive event, what is the probability that it really is positive?”

Specificity is the true negative rate. It answer the question, “If the model predicts a
negative event, what is the probability that it really is negative?”.

The false positive rate can be also defined as 1-specificity.

From above confusion matrix, the true positive (sensitivity) is 0.7207 and true negative (specificity) is 0.6154 which is slightly lower than true positive, it can be interpret as functional water point rate is slightly higher than non-functional wate point rate.

Geographically Weighted Regression Model

Converting sf to Spatial Point Dataframe

The code chunk below convert sf dataframe to Spatial Points Dataframe.

Osun_wp_sp <- Osun_wp_sf_clean %>%
  select(c(status,
           distance_to_primary_road,
           distance_to_secondary_road,
           distance_to_tertiary_road,
           distance_to_city,
           distance_to_town,
           water_point_population,
           local_population_1km,
           is_urban,
           usage_capacity,
           water_source_clean)) %>%
  as_Spatial()
Osun_wp_sp
class       : SpatialPointsDataFrame 
features    : 4756 
extent      : 182502.4, 290751, 340054.1, 450905.3  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 11
names       : status, distance_to_primary_road, distance_to_secondary_road, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, is_urban, usage_capacity, water_source_clean 
min values  :      0,        0.014461356813335,          0.152195902540837,         0.017815121653488, 53.0461399623541, 30.0019777713073,                      0,                    0,        0,           1000,           Borehole 
max values  :      1,         26909.8616132094,           19559.4793799085,          10966.2705628969,  47934.343603562, 44020.6393368124,                  29697,                36118,        1,            300,   Protected Spring 

Computing Fixed Bandwidth

bw.fixed <- bw.ggwr(status ~ distance_to_primary_road +
                     distance_to_secondary_road +
                     distance_to_tertiary_road +
                     distance_to_city +
                     distance_to_town +
                     is_urban +
                     usage_capacity +
                     water_source_clean +
                     water_point_population +
                     local_population_1km,
                   data = Osun_wp_sp,
                   family = "binomial",
                   approach = "AIC",
                   kernel = "gaussian",
                   adaptive = FALSE,
                   longlat = FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
 Iteration    Log-Likelihood:(With bandwidth:  95768.67 )
=========================
       0        -2889 
       1        -2836 
       2        -2830 
       3        -2829 
       4        -2829 
       5        -2829 
Fixed bandwidth: 95768.67 AICc value: 5684.357 
 Iteration    Log-Likelihood:(With bandwidth:  59200.13 )
=========================
       0        -2875 
       1        -2818 
       2        -2810 
       3        -2808 
       4        -2808 
       5        -2808 
Fixed bandwidth: 59200.13 AICc value: 5646.785 
 Iteration    Log-Likelihood:(With bandwidth:  36599.53 )
=========================
       0        -2847 
       1        -2781 
       2        -2768 
       3        -2765 
       4        -2765 
       5        -2765 
       6        -2765 
Fixed bandwidth: 36599.53 AICc value: 5575.148 
 Iteration    Log-Likelihood:(With bandwidth:  22631.59 )
=========================
       0        -2798 
       1        -2719 
       2        -2698 
       3        -2693 
       4        -2693 
       5        -2693 
       6        -2693 
Fixed bandwidth: 22631.59 AICc value: 5466.883 
 Iteration    Log-Likelihood:(With bandwidth:  13998.93 )
=========================
       0        -2720 
       1        -2622 
       2        -2590 
       3        -2581 
       4        -2580 
       5        -2580 
       6        -2580 
       7        -2580 
Fixed bandwidth: 13998.93 AICc value: 5324.578 
 Iteration    Log-Likelihood:(With bandwidth:  8663.649 )
=========================
       0        -2601 
       1        -2476 
       2        -2431 
       3        -2419 
       4        -2417 
       5        -2417 
       6        -2417 
       7        -2417 
Fixed bandwidth: 8663.649 AICc value: 5163.61 
 Iteration    Log-Likelihood:(With bandwidth:  5366.266 )
=========================
       0        -2436 
       1        -2268 
       2        -2194 
       3        -2167 
       4        -2161 
       5        -2161 
       6        -2161 
       7        -2161 
       8        -2161 
       9        -2161 
Fixed bandwidth: 5366.266 AICc value: 4990.587 
 Iteration    Log-Likelihood:(With bandwidth:  3328.371 )
=========================
       0        -2157 
       1        -1922 
       2        -1802 
       3        -1739 
       4        -1713 
       5        -1713 
Fixed bandwidth: 3328.371 AICc value: 4798.288 
 Iteration    Log-Likelihood:(With bandwidth:  2068.882 )
=========================
       0        -1751 
       1        -1421 
       2        -1238 
       3        -1133 
       4        -1084 
       5        -1084 
Fixed bandwidth: 2068.882 AICc value: 4837.017 
 Iteration    Log-Likelihood:(With bandwidth:  4106.777 )
=========================
       0        -2297 
       1        -2095 
       2        -1997 
       3        -1951 
       4        -1938 
       5        -1936 
       6        -1936 
       7        -1936 
       8        -1936 
Fixed bandwidth: 4106.777 AICc value: 4873.161 
 Iteration    Log-Likelihood:(With bandwidth:  2847.289 )
=========================
       0        -2036 
       1        -1771 
       2        -1633 
       3        -1558 
       4        -1525 
       5        -1525 
Fixed bandwidth: 2847.289 AICc value: 4768.192 
 Iteration    Log-Likelihood:(With bandwidth:  2549.964 )
=========================
       0        -1941 
       1        -1655 
       2        -1503 
       3        -1417 
       4        -1378 
       5        -1378 
Fixed bandwidth: 2549.964 AICc value: 4762.212 
 Iteration    Log-Likelihood:(With bandwidth:  2366.207 )
=========================
       0        -1874 
       1        -1573 
       2        -1410 
       3        -1316 
       4        -1274 
       5        -1274 
Fixed bandwidth: 2366.207 AICc value: 4773.081 
 Iteration    Log-Likelihood:(With bandwidth:  2663.532 )
=========================
       0        -1979 
       1        -1702 
       2        -1555 
       3        -1474 
       4        -1438 
       5        -1438 
Fixed bandwidth: 2663.532 AICc value: 4762.568 
 Iteration    Log-Likelihood:(With bandwidth:  2479.775 )
=========================
       0        -1917 
       1        -1625 
       2        -1468 
       3        -1380 
       4        -1339 
       5        -1339 
Fixed bandwidth: 2479.775 AICc value: 4764.294 
 Iteration    Log-Likelihood:(With bandwidth:  2593.343 )
=========================
       0        -1956 
       1        -1674 
       2        -1523 
       3        -1439 
       4        -1401 
       5        -1401 
Fixed bandwidth: 2593.343 AICc value: 4761.813 
 Iteration    Log-Likelihood:(With bandwidth:  2620.153 )
=========================
       0        -1965 
       1        -1685 
       2        -1536 
       3        -1453 
       4        -1415 
       5        -1415 
Fixed bandwidth: 2620.153 AICc value: 4761.89 
 Iteration    Log-Likelihood:(With bandwidth:  2576.774 )
=========================
       0        -1950 
       1        -1667 
       2        -1515 
       3        -1431 
       4        -1393 
       5        -1393 
Fixed bandwidth: 2576.774 AICc value: 4761.889 
 Iteration    Log-Likelihood:(With bandwidth:  2603.584 )
=========================
       0        -1960 
       1        -1678 
       2        -1528 
       3        -1445 
       4        -1407 
       5        -1407 
Fixed bandwidth: 2603.584 AICc value: 4761.813 
 Iteration    Log-Likelihood:(With bandwidth:  2609.913 )
=========================
       0        -1962 
       1        -1680 
       2        -1531 
       3        -1448 
       4        -1410 
       5        -1410 
Fixed bandwidth: 2609.913 AICc value: 4761.831 
 Iteration    Log-Likelihood:(With bandwidth:  2599.672 )
=========================
       0        -1958 
       1        -1676 
       2        -1526 
       3        -1443 
       4        -1405 
       5        -1405 
Fixed bandwidth: 2599.672 AICc value: 4761.809 
 Iteration    Log-Likelihood:(With bandwidth:  2597.255 )
=========================
       0        -1957 
       1        -1675 
       2        -1525 
       3        -1441 
       4        -1403 
       5        -1403 
Fixed bandwidth: 2597.255 AICc value: 4761.809 
bw.fixed
[1] 2599.672

From the above, we can see that the bandwidth calculated is 2599.672.

Building GWR Model (Fixed Bandwidth)

gwlr.fixed <- ggwr.basic(status ~
                      distance_to_primary_road +
                      distance_to_secondary_road +
                      distance_to_tertiary_road +
                      distance_to_city +
                      distance_to_town +
                      water_point_population +
                      local_population_1km +
                      is_urban +
                      usage_capacity +
                      water_source_clean,
                    data=Osun_wp_sp,
                    bw = bw.fixed,
                    family = "binomial",
                    kernel = "gaussian",
                    adaptive = FALSE,
                    longlat = FALSE)
 Iteration    Log-Likelihood
=========================
       0        -1958 
       1        -1676 
       2        -1526 
       3        -1443 
       4        -1405 
       5        -1405 
gwlr.fixed
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2022-12-17 23:15:22 
   Call:
   ggwr.basic(formula = status ~ distance_to_primary_road + distance_to_secondary_road + 
    distance_to_tertiary_road + distance_to_city + distance_to_town + 
    water_point_population + local_population_1km + is_urban + 
    usage_capacity + water_source_clean, data = Osun_wp_sp, bw = bw.fixed, 
    family = "binomial", kernel = "gaussian", adaptive = FALSE, 
    longlat = FALSE)

   Dependent (y) variable:  status
   Independent variables:  distance_to_primary_road distance_to_secondary_road distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km is_urban usage_capacity water_source_clean
   Number of data points: 4756
   Used family: binomial
   ***********************************************************************
   *              Results of Generalized linear Regression               *
   ***********************************************************************

Call:
NULL

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-124.555    -1.755     1.072     1.742    34.333  

Coefficients:
                                           Estimate Std. Error z value Pr(>|z|)
Intercept                                 3.887e-01  1.124e-01   3.459 0.000543
distance_to_primary_road                 -4.642e-06  6.490e-06  -0.715 0.474422
distance_to_secondary_road               -5.143e-06  9.299e-06  -0.553 0.580230
distance_to_tertiary_road                 9.683e-05  2.073e-05   4.671 3.00e-06
distance_to_city                         -1.686e-05  3.544e-06  -4.757 1.96e-06
distance_to_town                         -1.480e-05  3.009e-06  -4.917 8.79e-07
water_point_population                   -5.097e-04  4.484e-05 -11.369  < 2e-16
local_population_1km                      3.451e-04  1.788e-05  19.295  < 2e-16
is_urbanTRUE                             -2.971e-01  8.185e-02  -3.629 0.000284
usage_capacity1000                       -6.230e-01  6.972e-02  -8.937  < 2e-16
water_source_cleanProtected Shallow Well  5.040e-01  8.574e-02   5.878 4.14e-09
water_source_cleanProtected Spring        1.288e+00  4.388e-01   2.936 0.003325
                                            
Intercept                                ***
distance_to_primary_road                    
distance_to_secondary_road                  
distance_to_tertiary_road                ***
distance_to_city                         ***
distance_to_town                         ***
water_point_population                   ***
local_population_1km                     ***
is_urbanTRUE                             ***
usage_capacity1000                       ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring       ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6534.5  on 4755  degrees of freedom
Residual deviance: 5688.0  on 4744  degrees of freedom
AIC: 5712

Number of Fisher Scoring iterations: 5


 AICc:  5712.099
 Pseudo R-square value:  0.1295351
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Fixed bandwidth: 2599.672 
   Regression points: the same locations as observations are used.
   Distance metric: A distance matrix is specified for this model calibration.

   ************Summary of Generalized GWR coefficient estimates:**********
                                                   Min.     1st Qu.      Median
   Intercept                                -8.7228e+02 -4.9955e+00  1.7600e+00
   distance_to_primary_road                 -1.9389e-02 -4.8031e-04  2.9618e-05
   distance_to_secondary_road               -1.5921e-02 -3.7551e-04  1.2317e-04
   distance_to_tertiary_road                -1.5618e-02 -4.2368e-04  7.6179e-05
   distance_to_city                         -1.8416e-02 -5.6217e-04 -1.2726e-04
   distance_to_town                         -2.2411e-02 -5.7283e-04 -1.5155e-04
   water_point_population                   -5.2208e-02 -2.2767e-03 -9.8875e-04
   local_population_1km                     -1.2698e-01  4.9952e-04  1.0638e-03
   is_urbanTRUE                             -1.9790e+02 -4.2908e+00 -1.6864e+00
   usage_capacity1000                       -2.0772e+01 -9.7231e-01 -4.1592e-01
   water_source_cleanProtected.Shallow.Well -2.0789e+01 -4.5190e-01  5.3340e-01
   water_source_cleanProtected.Spring       -5.2235e+02 -5.5977e+00  2.5441e+00
                                                3rd Qu.      Max.
   Intercept                                 1.2763e+01 1073.2154
   distance_to_primary_road                  4.8443e-04    0.0142
   distance_to_secondary_road                6.0692e-04    0.0258
   distance_to_tertiary_road                 6.6814e-04    0.0128
   distance_to_city                          2.3718e-04    0.0150
   distance_to_town                          1.9271e-04    0.0224
   water_point_population                    5.0102e-04    0.1309
   local_population_1km                      1.8157e-03    0.0392
   is_urbanTRUE                              1.2841e+00  744.3097
   usage_capacity1000                        3.0322e-01    5.9281
   water_source_cleanProtected.Shallow.Well  1.7849e+00   67.6343
   water_source_cleanProtected.Spring        6.7663e+00  317.4123
   ************************Diagnostic information*************************
   Number of data points: 4756 
   GW Deviance: 2795.084 
   AIC : 4414.606 
   AICc : 4747.423 
   Pseudo R-square value:  0.5722559 

   ***********************************************************************
   Program stops at: 2022-12-17 23:16:05 

AIC for Generalized linear Regression model is 5712 compare with AIC of Geographically Weighted Regression which is 4414.606, so we can say Geographically Weighted Regression model performance is better than Generalized linear Regression model.

Converting SDF into sf dataframe

To access the performance of the gwLR, firstly, we will convert the SDF object in as dataframe by using the code chunk below.

gwr.fixed <- as.data.frame(gwlr.fixed$SDF)

next, we will label yhat values greater or equal to 0.5 into 1 and else 0. The result of the logic comparison operation will be saved into a field called most.

gwr.fixed <- gwr.fixed %>%
  mutate(most = ifelse(
    gwr.fixed$yhat >= 0.5, T, F))

confusionMatrix is using to show the confusion matrix of GW model using fixed bandwidth.

gwr.fixed$y <- as.factor(gwr.fixed$y)
gwr.fixed$most <- as.factor(gwr.fixed$most)
CM <- confusionMatrix(data=gwr.fixed$most, reference = gwr.fixed$y, positive = "TRUE")
CM
Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE  1824  263
     TRUE    290 2379
                                          
               Accuracy : 0.8837          
                 95% CI : (0.8743, 0.8927)
    No Information Rate : 0.5555          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.7642          
                                          
 Mcnemar's Test P-Value : 0.2689          
                                          
            Sensitivity : 0.9005          
            Specificity : 0.8628          
         Pos Pred Value : 0.8913          
         Neg Pred Value : 0.8740          
             Prevalence : 0.5555          
         Detection Rate : 0.5002          
   Detection Prevalence : 0.5612          
      Balanced Accuracy : 0.8816          
                                          
       'Positive' Class : TRUE            
                                          

We can see from the CM result, True Positive rate improved for gwLR model.

Visulising gwLR

Osun_wp_sf_selected <- Osun_wp_sf_clean %>%
  select(c(ADM2_EN, ADM2_PCODE,
           ADM1_EN, ADM1_PCODE,
           status))
gwr_sf.fixed <- cbind(Osun_wp_sf_selected, gwr.fixed)
tmap_mode("view")
tmap mode set to interactive viewing
prob_T <- tm_shape(Osun) +
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf.fixed) +
  tm_dots(col = "yhat",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(8,14))
prob_T

Building a Logistic Regression Models with removal of Statistically insingnificant variables

In the earlier report of binary logistic regression model, we saw two variables distance_to_primary_road and distance_to_secondary_road are not statistically significant at 95% Confidence Level. Therefore we should exclude these two variables when build the logistic regression model.

Below code chunk repeat above steps for rebuild the model for new fixed bandwidth without above two variables.

model_re <- glm(status ~ distance_to_tertiary_road +
               distance_to_city +
               distance_to_town +
               is_urban +
               usage_capacity +
               water_source_clean +
               water_point_population +
               local_population_1km,
             data = Osun_wp_sf_clean,
             family = binomial(link = 'logit'))
blr_regress(model_re)
                             Model Overview                              
------------------------------------------------------------------------
Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
------------------------------------------------------------------------
  data       status     4756      4755           4746           TRUE     
------------------------------------------------------------------------

                    Response Summary                     
--------------------------------------------------------
Outcome        Frequency        Outcome        Frequency 
--------------------------------------------------------
   0             2114              1             2642    
--------------------------------------------------------

                                 Maximum Likelihood Estimates                                   
-----------------------------------------------------------------------------------------------
               Parameter                    DF    Estimate    Std. Error    z value     Pr(>|z|) 
-----------------------------------------------------------------------------------------------
              (Intercept)                   1      0.3540        0.1055      3.3541       8e-04 
       distance_to_tertiary_road            1      1e-04         0.0000      4.9096      0.0000 
            distance_to_city                1      0.0000        0.0000     -5.2022      0.0000 
            distance_to_town                1      0.0000        0.0000     -5.4660      0.0000 
              is_urbanTRUE                  1     -0.2667        0.0747     -3.5690       4e-04 
           usage_capacity1000               1     -0.6206        0.0697     -8.9081      0.0000 
water_source_cleanProtected Shallow Well    1      0.4947        0.0850      5.8228      0.0000 
   water_source_cleanProtected Spring       1      1.2790        0.4384      2.9174      0.0035 
         water_point_population             1      -5e-04        0.0000    -11.3902      0.0000 
          local_population_1km              1      3e-04         0.0000     19.4069      0.0000 
-----------------------------------------------------------------------------------------------

 Association of Predicted Probabilities and Observed Responses  
---------------------------------------------------------------
% Concordant          0.7349          Somers' D        0.4697   
% Discordant          0.2651          Gamma            0.4697   
% Tied                0.0000          Tau-a            0.2320   
Pairs                5585188          c                0.7349   
---------------------------------------------------------------
report(model_re)
We fitted a logistic model (estimated using ML) to predict status with
distance_to_tertiary_road (formula: status ~ distance_to_tertiary_road +
distance_to_city + distance_to_town + is_urban + usage_capacity +
water_source_clean + water_point_population + local_population_1km). The
model's explanatory power is moderate (Tjur's R2 = 0.16). The model's
intercept, corresponding to distance_to_tertiary_road = 0, is at 0.35 (95% CI
[0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.35 (95%
CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_tertiary_road + distance_to_city + distance_to_town +
is_urban + usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
local_population_1km = 0, is at 0.35 (95% CI [0.15, 0.56], p < .001). Within
this model:

  - The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
  - The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
  - The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
  - The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
  - The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
  - The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
  - The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
  - The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
  - The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
blr_confusion_matrix(model_re, cutoff = 0.5)
Confusion Matrix and Statistics 

          Reference
Prediction FALSE TRUE
         0  1300  743
         1   814 1899

                Accuracy : 0.6726 
     No Information Rate : 0.4445 

                   Kappa : 0.3348 

McNemars's Test P-Value  : 0.0761 

             Sensitivity : 0.7188 
             Specificity : 0.6149 
          Pos Pred Value : 0.7000 
          Neg Pred Value : 0.6363 
              Prevalence : 0.5555 
          Detection Rate : 0.3993 
    Detection Prevalence : 0.5704 
       Balanced Accuracy : 0.6669 
               Precision : 0.7000 
                  Recall : 0.7188 

        'Positive' Class : 1

Computing Fixed Bandwidth

bw.fixed_re <- bw.ggwr(status ~ distance_to_tertiary_road +
                     distance_to_city +
                     distance_to_town +
                     is_urban +
                     usage_capacity +
                     water_source_clean +
                     water_point_population +
                     local_population_1km,
                   data = Osun_wp_sp,
                   family = "binomial",
                   approach = "AIC",
                   kernel = "gaussian",
                   adaptive = FALSE,
                   longlat = FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
 Iteration    Log-Likelihood:(With bandwidth:  95768.67 )
=========================
       0        -2890 
       1        -2837 
       2        -2830 
       3        -2829 
       4        -2829 
       5        -2829 
Fixed bandwidth: 95768.67 AICc value: 5681.18 
 Iteration    Log-Likelihood:(With bandwidth:  59200.13 )
=========================
       0        -2878 
       1        -2820 
       2        -2812 
       3        -2810 
       4        -2810 
       5        -2810 
Fixed bandwidth: 59200.13 AICc value: 5645.901 
 Iteration    Log-Likelihood:(With bandwidth:  36599.53 )
=========================
       0        -2854 
       1        -2790 
       2        -2777 
       3        -2774 
       4        -2774 
       5        -2774 
       6        -2774 
Fixed bandwidth: 36599.53 AICc value: 5585.354 
 Iteration    Log-Likelihood:(With bandwidth:  22631.59 )
=========================
       0        -2810 
       1        -2732 
       2        -2711 
       3        -2707 
       4        -2707 
       5        -2707 
       6        -2707 
Fixed bandwidth: 22631.59 AICc value: 5481.877 
 Iteration    Log-Likelihood:(With bandwidth:  13998.93 )
=========================
       0        -2732 
       1        -2635 
       2        -2604 
       3        -2597 
       4        -2596 
       5        -2596 
       6        -2596 
Fixed bandwidth: 13998.93 AICc value: 5333.718 
 Iteration    Log-Likelihood:(With bandwidth:  8663.649 )
=========================
       0        -2624 
       1        -2502 
       2        -2459 
       3        -2447 
       4        -2446 
       5        -2446 
       6        -2446 
       7        -2446 
Fixed bandwidth: 8663.649 AICc value: 5178.493 
 Iteration    Log-Likelihood:(With bandwidth:  5366.266 )
=========================
       0        -2478 
       1        -2319 
       2        -2250 
       3        -2225 
       4        -2219 
       5        -2219 
       6        -2220 
       7        -2220 
       8        -2220 
       9        -2220 
Fixed bandwidth: 5366.266 AICc value: 5022.016 
 Iteration    Log-Likelihood:(With bandwidth:  3328.371 )
=========================
       0        -2222 
       1        -2002 
       2        -1894 
       3        -1838 
       4        -1818 
       5        -1814 
       6        -1814 
Fixed bandwidth: 3328.371 AICc value: 4827.587 
 Iteration    Log-Likelihood:(With bandwidth:  2068.882 )
=========================
       0        -1837 
       1        -1528 
       2        -1357 
       3        -1261 
       4        -1222 
       5        -1222 
Fixed bandwidth: 2068.882 AICc value: 4772.046 
 Iteration    Log-Likelihood:(With bandwidth:  1290.476 )
=========================
       0        -1403 
       1        -1016 
       2       -807.3 
       3       -680.2 
       4       -680.2 
Fixed bandwidth: 1290.476 AICc value: 5809.719 
 Iteration    Log-Likelihood:(With bandwidth:  2549.964 )
=========================
       0        -2019 
       1        -1753 
       2        -1614 
       3        -1538 
       4        -1506 
       5        -1506 
Fixed bandwidth: 2549.964 AICc value: 4764.056 
 Iteration    Log-Likelihood:(With bandwidth:  2847.289 )
=========================
       0        -2108 
       1        -1862 
       2        -1736 
       3        -1670 
       4        -1644 
       5        -1644 
Fixed bandwidth: 2847.289 AICc value: 4791.834 
 Iteration    Log-Likelihood:(With bandwidth:  2366.207 )
=========================
       0        -1955 
       1        -1675 
       2        -1525 
       3        -1441 
       4        -1407 
       5        -1407 
Fixed bandwidth: 2366.207 AICc value: 4755.524 
 Iteration    Log-Likelihood:(With bandwidth:  2252.639 )
=========================
       0        -1913 
       1        -1623 
       2        -1465 
       3        -1376 
       4        -1341 
       5        -1341 
Fixed bandwidth: 2252.639 AICc value: 4759.188 
 Iteration    Log-Likelihood:(With bandwidth:  2436.396 )
=========================
       0        -1980 
       1        -1706 
       2        -1560 
       3        -1479 
       4        -1446 
       5        -1446 
Fixed bandwidth: 2436.396 AICc value: 4756.675 
 Iteration    Log-Likelihood:(With bandwidth:  2322.828 )
=========================
       0        -1940 
       1        -1656 
       2        -1503 
       3        -1417 
       4        -1382 
       5        -1382 
Fixed bandwidth: 2322.828 AICc value: 4756.471 
 Iteration    Log-Likelihood:(With bandwidth:  2393.017 )
=========================
       0        -1965 
       1        -1687 
       2        -1539 
       3        -1456 
       4        -1422 
       5        -1422 
Fixed bandwidth: 2393.017 AICc value: 4755.57 
 Iteration    Log-Likelihood:(With bandwidth:  2349.638 )
=========================
       0        -1949 
       1        -1668 
       2        -1517 
       3        -1432 
       4        -1398 
       5        -1398 
Fixed bandwidth: 2349.638 AICc value: 4755.753 
 Iteration    Log-Likelihood:(With bandwidth:  2376.448 )
=========================
       0        -1959 
       1        -1680 
       2        -1530 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2376.448 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2382.777 )
=========================
       0        -1961 
       1        -1683 
       2        -1534 
       3        -1450 
       4        -1416 
       5        -1416 
Fixed bandwidth: 2382.777 AICc value: 4755.491 
 Iteration    Log-Likelihood:(With bandwidth:  2372.536 )
=========================
       0        -1958 
       1        -1678 
       2        -1528 
       3        -1445 
       4        -1411 
       5        -1411 
Fixed bandwidth: 2372.536 AICc value: 4755.488 
 Iteration    Log-Likelihood:(With bandwidth:  2378.865 )
=========================
       0        -1960 
       1        -1681 
       2        -1532 
       3        -1448 
       4        -1414 
       5        -1414 
Fixed bandwidth: 2378.865 AICc value: 4755.481 
 Iteration    Log-Likelihood:(With bandwidth:  2374.954 )
=========================
       0        -1959 
       1        -1679 
       2        -1530 
       3        -1446 
       4        -1412 
       5        -1412 
Fixed bandwidth: 2374.954 AICc value: 4755.482 
 Iteration    Log-Likelihood:(With bandwidth:  2377.371 )
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2377.371 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2377.942 )
=========================
       0        -1960 
       1        -1680 
       2        -1531 
       3        -1448 
       4        -1414 
       5        -1414 
Fixed bandwidth: 2377.942 AICc value: 4755.48 
 Iteration    Log-Likelihood:(With bandwidth:  2377.018 )
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 
Fixed bandwidth: 2377.018 AICc value: 4755.48 
bw.fixed_re
[1] 2377.371

From the above, we can see that the new bandwidth calculated is 2377.371.

Building GWR Model (Fixed Bandwidth)

gwlr.fixed_re <- ggwr.basic(status ~distance_to_tertiary_road +
                      distance_to_city +
                      distance_to_town +
                      water_point_population +
                      local_population_1km +
                      is_urban +
                      usage_capacity +
                      water_source_clean,
                    data=Osun_wp_sp,
                    bw = bw.fixed_re,
                    family = "binomial",
                    kernel = "gaussian",
                    adaptive = FALSE,
                    longlat = FALSE)
 Iteration    Log-Likelihood
=========================
       0        -1959 
       1        -1680 
       2        -1531 
       3        -1447 
       4        -1413 
       5        -1413 
gwlr.fixed_re
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2022-12-17 23:27:11 
   Call:
   ggwr.basic(formula = status ~ distance_to_tertiary_road + distance_to_city + 
    distance_to_town + water_point_population + local_population_1km + 
    is_urban + usage_capacity + water_source_clean, data = Osun_wp_sp, 
    bw = bw.fixed_re, family = "binomial", kernel = "gaussian", 
    adaptive = FALSE, longlat = FALSE)

   Dependent (y) variable:  status
   Independent variables:  distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km is_urban usage_capacity water_source_clean
   Number of data points: 4756
   Used family: binomial
   ***********************************************************************
   *              Results of Generalized linear Regression               *
   ***********************************************************************

Call:
NULL

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-129.368    -1.750     1.074     1.742    34.126  

Coefficients:
                                           Estimate Std. Error z value Pr(>|z|)
Intercept                                 3.540e-01  1.055e-01   3.354 0.000796
distance_to_tertiary_road                 1.001e-04  2.040e-05   4.910 9.13e-07
distance_to_city                         -1.764e-05  3.391e-06  -5.202 1.97e-07
distance_to_town                         -1.544e-05  2.825e-06  -5.466 4.60e-08
water_point_population                   -5.098e-04  4.476e-05 -11.390  < 2e-16
local_population_1km                      3.452e-04  1.779e-05  19.407  < 2e-16
is_urbanTRUE                             -2.667e-01  7.474e-02  -3.569 0.000358
usage_capacity1000                       -6.206e-01  6.966e-02  -8.908  < 2e-16
water_source_cleanProtected Shallow Well  4.947e-01  8.496e-02   5.823 5.79e-09
water_source_cleanProtected Spring        1.279e+00  4.384e-01   2.917 0.003530
                                            
Intercept                                ***
distance_to_tertiary_road                ***
distance_to_city                         ***
distance_to_town                         ***
water_point_population                   ***
local_population_1km                     ***
is_urbanTRUE                             ***
usage_capacity1000                       ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring       ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6534.5  on 4755  degrees of freedom
Residual deviance: 5688.9  on 4746  degrees of freedom
AIC: 5708.9

Number of Fisher Scoring iterations: 5


 AICc:  5708.923
 Pseudo R-square value:  0.129406
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Fixed bandwidth: 2377.371 
   Regression points: the same locations as observations are used.
   Distance metric: A distance matrix is specified for this model calibration.

   ************Summary of Generalized GWR coefficient estimates:**********
                                                   Min.     1st Qu.      Median
   Intercept                                -3.7021e+02 -4.3797e+00  3.5590e+00
   distance_to_tertiary_road                -3.1622e-02 -4.5462e-04  9.1291e-05
   distance_to_city                         -5.4555e-02 -6.5623e-04 -1.3507e-04
   distance_to_town                         -8.6549e-03 -5.2754e-04 -1.6785e-04
   water_point_population                   -2.9696e-02 -2.2705e-03 -1.2277e-03
   local_population_1km                     -7.7730e-02  4.4281e-04  1.0548e-03
   is_urbanTRUE                             -7.3554e+02 -3.4675e+00 -1.6596e+00
   usage_capacity1000                       -5.5889e+01 -1.0347e+00 -4.1960e-01
   water_source_cleanProtected.Shallow.Well -1.8842e+02 -4.7295e-01  6.2378e-01
   water_source_cleanProtected.Spring       -1.3630e+03 -5.3436e+00  2.7714e+00
                                                3rd Qu.      Max.
   Intercept                                 1.3755e+01 2171.6373
   distance_to_tertiary_road                 6.3011e-04    0.0237
   distance_to_city                          1.5921e-04    0.0162
   distance_to_town                          2.4490e-04    0.0179
   water_point_population                    4.5879e-04    0.0765
   local_population_1km                      1.8479e-03    0.0333
   is_urbanTRUE                              1.0554e+00  995.1840
   usage_capacity1000                        3.9113e-01    9.2449
   water_source_cleanProtected.Shallow.Well  1.9564e+00   66.8914
   water_source_cleanProtected.Spring        7.0805e+00  208.3749
   ************************Diagnostic information*************************
   Number of data points: 4756 
   GW Deviance: 2815.659 
   AIC : 4418.776 
   AICc : 4744.213 
   Pseudo R-square value:  0.5691072 

   ***********************************************************************
   Program stops at: 2022-12-17 23:28:00 

AIC for Generalized linear Regression model is 5708.923 compare with AIC of Geographically Weighted Regression which is 4418.776, so we can say Geographically Weighted Regression model performance is better than Generalized linear Regression model.

Converting SDF into sf dataframe

To access the performance of the gwLR, firstly, we will convert the SDF object in as dataframe by using the code chunk below.

gwr.fixed_re <- as.data.frame(gwlr.fixed_re$SDF)
gwr.fixed_re <- gwr.fixed_re %>%
  mutate(most = ifelse(
    gwr.fixed_re$yhat >= 0.5, T, F))
gwr.fixed_re$y <- as.factor(gwr.fixed_re$y)
gwr.fixed_re$most <- as.factor(gwr.fixed_re$most)
CM_re <- confusionMatrix(data=gwr.fixed_re$most, reference = gwr.fixed_re$y, positive = "TRUE")
CM_re
Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE  1833  268
     TRUE    281 2374
                                          
               Accuracy : 0.8846          
                 95% CI : (0.8751, 0.8935)
    No Information Rate : 0.5555          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.7661          
                                          
 Mcnemar's Test P-Value : 0.6085          
                                          
            Sensitivity : 0.8986          
            Specificity : 0.8671          
         Pos Pred Value : 0.8942          
         Neg Pred Value : 0.8724          
             Prevalence : 0.5555          
         Detection Rate : 0.4992          
   Detection Prevalence : 0.5582          
      Balanced Accuracy : 0.8828          
                                          
       'Positive' Class : TRUE            
                                          

We can see from the CM result, True Positive rate improved for gwLR model.

Visulising gwLR

gwr_sf.fixed_re <- cbind(Osun_wp_sf_selected, gwr.fixed_re)
tmap_mode("view")
tmap mode set to interactive viewing
prob_T_re <- tm_shape(Osun) +
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf.fixed_re) +
  tm_dots(col = "yhat",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(8,14))
prob_T_re

Performance Comparison

Table below shows the True positive and True negative comparison of 4 logistic regression models.

Performance

Measure

Generalized linear Regression (all variables) gwLR Model (all variables) Generalized linear Regression (exclude 2 var) gwLR Model (exclude 2 var)
TP 0.7207 0.9005 0.7188 0.8986
TN 0.6154 0.8628 0.6149 0.8671

From above comparison table, the True Positive did not improve after we exclude the two in-significant variables. However, the True Negative of gwLR model improved slightly with exclude the two variables. So we may consider to use gwLR model if we want to focus on analysing non-functional water point ratio. On the other hand, compare the results of GW regression model and generalized linear regression model, it cleared shown that the GW regression model is more robust either in AIC value or the performance measures.